Identifying stigmatizing language in clinical documentation: A scoping review of emerging literature

Background: Racism and implicit bias underlie disparities in health care access, treatment, and outcomes. An emerging area of study in examining health disparities is the use of stigmatizing language in the electronic health record (EHR).
Objectives: We sought to summarize the existing literature related to stigmatizing language documented in the EHR. To this end, we conducted a scoping review to identify, describe, and evaluate the current body of literature related to stigmatizing language and clinician notes.
Methods: We searched PubMed, Cumulative Index of Nursing and Allied Health Literature (CINAHL), and Embase databases in May 2022, and also conducted a hand search of IEEE to identify studies investigating stigmatizing language in clinical documentation. We included all studies published through April 2022. The results for each search were uploaded into EndNote X9 software, de-duplicated using the Bramer method, and then exported to Covidence software for title and abstract screening.
Results: Studies (N = 9) used cross-sectional (n = 3), qualitative (n = 3), mixed methods (n = 2), and retrospective cohort (n = 1) designs. Stigmatizing language was defined via content analysis of clinical documentation (n = 4), literature review (n = 2), interviews with clinicians (n = 3) and patients (n = 1), expert panel consultation, and task force guidelines (n = 1). Natural language processing (NLP) was used in four studies to identify and extract stigmatizing words from clinical notes. All of the studies reviewed concluded that negative clinician attitudes and the use of stigmatizing language in documentation could negatively impact patient perception of care or health outcomes.
Discussion: The current literature indicates that NLP is an emerging approach to identifying stigmatizing language documented in the EHR. NLP-based solutions can be developed and integrated into routine documentation systems to screen for stigmatizing language and alert clinicians or their supervisors. Potential interventions resulting from this research could generate awareness about how implicit biases affect communication patterns and work to achieve equitable health care for diverse populations.


Introduction
Racial and ethnic disparities in health care access, treatment, and outcomes have been documented for decades [1]. Prior studies have shown that concerns expressed by Black patients are more likely to be dismissed or ignored than those expressed by White patients [2]. This differential treatment has been observed among Black and African American patients, leading to disparities in outcomes, [1,3,4] and specifically in the treatment of cardiovascular diseases, [5] pain, [6] and breast cancer [7]. Racism occurring on the structural, interpersonal, or cultural levels has been identified as the primary reason for disparities in health outcomes [8]. Researchers have examined clinician biases by studying racial bias in patient-clinician interactions, finding that stereotyping and lack of empathy towards patients by race influenced health care outcomes [9].
Stigmatizing language has been defined as language that communicates unintended meanings that can perpetuate socially constructed power dynamics and result in bias [10]. Recent studies suggest that racial biases may also be identified by examining stigmatizing language in clinician notes documented in the electronic health record (EHR) [11-14]. Racial differences in documentation patterns may reflect unconscious biases and stereotypes that could negatively affect the quality of care [14]. Examples of stigmatizing language may include the use of quotations to identify disbelief in what the patient is reporting, questioning patient credibility, sentence construction that implies hearsay, and the use of judgment words [13]. Stigmatizing language in clinical notes has been associated with more negative attitudes towards the patient and less effective management of patient pain by physicians [14].

Objective
It is unknown to what extent and how stigmatizing language has been studied in healthcare settings, and study designs and foci differ. Emerging studies have used traditional qualitative methods, including interviews with patients and clinicians. Other research has used natural language processing (NLP), a computer science-based technique that helps extract meaning from large bodies of text, to quantify how EHR notes reflect stigmatizing language by race and ethnicity. The purpose of this scoping review was to identify, describe, and evaluate the presence and type of stigmatizing language in clinician documentation in the literature.

Design
A scoping review was chosen instead of a systematic review as the purpose was to identify and map the emerging evidence [15]. This review was conducted using PRISMA-ScR guidelines for scoping reviews [16].

Search strategy
The authors discussed the selection and coverage of three concepts (i.e., stigmatizing language, clinician, and clinical documentation) for review based on the research question. For purposes of the current study, the concept of "clinician" includes physicians and nurses. We searched PubMed, Cumulative Index of Nursing and Allied Health Literature (CINAHL), and Embase databases in May 2022 to identify studies investigating stigmatizing language in clinical documentation. We also conducted an updated hand search of the IEEE Xplore database for articles published through April 2022; this search did not identify any additional articles that met inclusion criteria and were not already included in our review. The results for each search were uploaded into EndNote X9 software, de-duplicated using the Bramer method [17], and then exported to Covidence software for title and abstract screening. The search strategy is detailed in S1 Table.

Inclusion criteria
The initial search yielded 1,482 articles for review. After de-duplication, 897 articles were included for title and abstract screening. Two authors (BI, DS) independently screened all articles by title and abstract and documented reasons for exclusion, when applicable. Studies were included if they investigated stigmatizing language in clinical documentation. Studies that examined stigmatizing language in patient-provider interactions that did not involve documentation (e.g., verbal communication) were excluded. Articles not in English, review articles, editorials, commentaries, and articles without full-text availability were also excluded. The same reviewers independently assessed all potentially relevant articles in the full-text review to comprehensively determine eligibility for inclusion, and also searched reference lists for additional articles. Discrepancies were discussed with the team to achieve consensus. From the 40 articles included for full-text review, nine articles were included for final synthesis (Fig 1).

Data extraction and quality assessment
Relevant information categories from each included article were extracted by two authors (BI, DS). Two other co-authors with expertise in health informatics (MT, HM) reviewed and validated all the extracted data elements. These information categories included: authors, year of publication, study aim and design, clinical setting, data source, clinician specialty, clinical note type (when available), study population, number of clinical notes used, data analysis approach, outcomes, and stigmatizing language identified. The Mixed Methods Appraisal Tool (MMAT) [18] was used to evaluate study quality and the risk of bias in the included articles.
Methods for measuring and defining stigmatizing language varied by study. Specifically, stigmatizing language was identified via interviews with clinicians [19,20,22] and patients, [19] content analysis of clinical documentation, [13,21,23,24] literature review, [11,12] expert panel consultation, [11] and task force guidelines from relevant professional organizations [12]. Definitions of stigmatizing language or bias also varied by study, with most studies focusing on discipline-specific words communicating judgment or negative bias (Table 1). Stigmatizing language often included stereotyping by race and ethnicity. One example found in clinician documentation in the EHR was the use of quotes to highlight "unsophisticated" patient language, e.g., "...patient states that the wound 'busted open'" [21].
Another study found that physician notes written about Black patients had up to 50% higher odds of containing evidentials (language used by the writer questioning the veracity of the patient's words) and stigmatizing language than those of White patients [13]. Similarly, physicians documented more negative feelings such as disapproval, discrediting, and stereotyping toward Black patients than White patients [21].
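To make the phrase "50% higher odds" concrete, the following minimal Python sketch computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. The counts are hypothetical and chosen only to give an odds ratio near 1.5; they are not taken from any reviewed study, and the reviewed studies may have used adjusted regression models rather than this simple calculation.

```python
import math


def odds_ratio(group_with, group_without, ref_with, ref_without):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    group_*: notes in the comparison group with / without the language.
    ref_*:   notes in the reference group with / without the language.
    All four cells must be greater than zero.
    """
    estimate = (group_with * ref_without) / (group_without * ref_with)
    # Standard error of the log odds ratio.
    se = math.sqrt(
        1 / group_with + 1 / group_without + 1 / ref_with + 1 / ref_without
    )
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio(
    group_with=150, group_without=850, ref_with=105, ref_without=895
)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio of about 1.5 corresponds to 50% higher odds, which is not the same as a 50% higher probability; the two diverge as the language becomes more common in the reference group.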
Often, clinical documentation studied was in the form of clinical notes. The most commonly analyzed clinical notes included those documented by physicians (n = 3), [12,13,22] followed by nurses (n = 1), [24] advanced practice providers (n = 1), [12] and interdisciplinary team members including radiologists, respiratory therapists, nutritionists, social workers, case managers, and pharmacists (n = 1). Sun et al. examined history and physical notes written by medical providers, although no further detail about the type of providers was specified [11]. Reporting of race and ethnicity of study participants varied widely. Three studies did not specify race at all, [20,22,24] and two reported only White and Black participant races [13,21]. Two studies described findings by race and ethnicity, including Black (or African American), Hispanic, White, and Asian categories [12,23]. The remaining studies reported race and ethnicity as either White, Black, or Hispanic [11] or White or Hispanic [19].
Studies that conducted interviews focused on how clinical notes were written and may be interpreted by patients, [22] barriers and facilitators to providing care, [19] patients' perceptions of their hospitalization, [19] and clinician insights on racial bias and EHR documentation [20]. Qualitative themes identified related to stigmatizing language included a reluctance to describe patients as "difficult" or "obese" due to the social stigma attached to common medical language, [22] intentional and unintentional perpetration of stigma in clinical notes, [19] and identification of potential racial bias through documentation [20].
In terms of methods, four studies used NLP [11-13,22] to extract terms from clinical notes matching those in predefined vocabularies of stigmatizing language terms. After NLP, statistical analyses were conducted to calculate and compare the odds of stigmatizing language occurrence among different patient populations. Two of the NLP-based studies used Linguistic Inquiry and Word Count (LIWC: a standardized vocabulary of terms), while others created their own hand-crafted vocabularies. One of the studies that involved the use of NLP [11] developed a machine learning classifier to automatically detect stigmatizing language. This was the only study that measured the accuracy of automated NLP-based stigmatizing language detection, and it reported high accuracy (F1 score = 0.94).
Despite the wide variety of clinical settings in the reviewed studies, all studies identified negative language, bias, racial bias, or stigmatizing language in clinician attitudes and/or documentation that could negatively impact patient perception or outcomes. Disparities in stigmatizing language use in the EHR were evident by race and ethnicity both in clinician interviews [20,22,24] and analyses of clinical notes [11-13, 19, 21, 23]. There may be discipline-specific stigmatizing language and terms (i.e., addiction [19]) and paternalistic attitudes that state that clinical notes are for clinician communication and not for patients to read (i.e., oncology [22]) that warrant further investigation.
In Table 2, results of the study quality assessments are presented. All studies asked clear research questions and collected data to address the research questions. Among quantitative studies (n = 4), three met all five criteria for quality, and the remaining study did not adequately describe measurement, confounders, or intervention fidelity. The qualitative studies (n = 3) met the criteria for four of five quality components assessed, with two studies lacking an explicit discussion of the qualitative approach. Neither of the mixed methods studies (n = 2) met all quality criteria: one did not include an adequate rationale for using this design, the other did not discuss inconsistencies between quantitative and qualitative results, and neither adhered to all criteria for quantitative and/or qualitative methods.
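The vocabulary-matching step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline of any reviewed study: the lexicon, the example notes, and the function names `compile_lexicon` and `find_terms` are all hypothetical, and real studies used curated vocabularies such as LIWC or expert-built term lists.

```python
import re

# Hypothetical lexicon, for illustration only; not a validated vocabulary.
LEXICON = ["difficult", "noncompliant", "claims", "insists"]


def compile_lexicon(terms):
    """Build one case-insensitive pattern that matches whole words only."""
    # Longer terms first so multi-word entries win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def find_terms(note, matcher):
    """Return (term, start, end) for every lexicon match in a note."""
    return [(m.group(0).lower(), m.start(), m.end())
            for m in matcher.finditer(note)]


# Hypothetical note text, written for this example.
notes = [
    "Patient claims pain is 10/10 and was noncompliant with medication.",
    "Patient reports pain is 10/10; medication reviewed.",
]

matcher = compile_lexicon(LEXICON)
# One boolean per note: does it contain at least one lexicon term?
flagged = [bool(find_terms(note, matcher)) for note in notes]
print(flagged)
```

The per-note flags are the kind of binary outcome that can then be compared across patient groups. Pure term matching ignores context (for example, negation or quoted patient speech), which is one reason the accuracy of such systems needs to be evaluated.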

Discussion
In this review, we identified the types and frequency of stigmatizing language in EHR notes, establishing an underpinning for future research on the correlation between communication patterns and outcomes (i.e., hospitalization, mortality, complications, disease stability, symptom control). With continuous advancements in the field of NLP, we believe that these methods (including deep learning-based methods) will be essential tools in future stigmatizing language studies.
It is crucial to evaluate NLP-based system performance to ensure accurate concept identification and reliable results; however, this was only done in one study [11]. Further NLP studies are needed that evaluate the accuracy of the resulting systems to ensure that stigmatizing language is identified correctly. The two studies reviewed here that used NLP did not assess clinical relevance, limiting their findings. In addition to accurate stigmatizing language identification, clinical relevance must be assessed to determine to what extent NLP systems are useful for predicting the association between language use and clinical outcomes. Finally, there is a gap in the literature for NLP-specific bias assessment. There is a need for further development of NLP for identifying stigmatizing language, as these methods may not detect all stigmatizing language, and outcomes may be driven by the level of bias among annotators. The quality of training data is vital in algorithm development, and more research should be done describing the biases of people performing annotation. This type of acknowledgment is increasingly common in journals where authors are required to submit positionality statements; however, we suggest that this go further for annotators, as life experiences influence assessments of whether bias or stigma is present. We did not perform a specific evaluation of the NLP-only studies, due to their small number. However, further studies should be done to evaluate the quality of NLP studies and the validity of NLP results. Specific criteria for this domain should be developed.
The identification of stigmatizing language use in EHR notes is vital as this language may foster the transmission of bias between clinicians and may represent a value judgment of the intrinsic worth assigned to a patient [11]. Further, with the passage of the 21st Century Cures Act in the US, federal policy now requires the availability of clinical notes to patients [25]. Clinical notes that reflect clinician bias may harm the patient-clinician relationship and hinder or damage the establishment of trust required for positive interactions in health care settings. Medical mistrust is a persistent problem contributing to delays in seeking care and widening disparities in disease outcomes for many vulnerable populations, [26] hence efforts are needed to improve the current situation.
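The two evaluation concerns raised in this discussion, system accuracy and annotator variability, can each be quantified with a standard metric. The Python sketch below computes precision, recall, and F1 for a system against one annotator, and Cohen's kappa between two annotators. The labels are hypothetical (1 = note judged to contain stigmatizing language), and the choice of Cohen's kappa is an assumption made for illustration; the reviewed studies did not necessarily report it.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on binary labels."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    rate_a = sum(rater_a) / n
    rate_b = sum(rater_b) / n
    # Agreement expected if both raters labeled independently at their own rates.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight notes, for illustration only.
annotator_a = [1, 1, 1, 0, 0, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 0, 0, 1, 1]
system = [1, 1, 1, 0, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(annotator_a, system)
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"F1 vs annotator A = {f1:.2f}, kappa (A vs B) = {kappa:.2f}")
```

A system's F1 is only as trustworthy as the reference labels it is scored against, so low agreement between annotators limits how an F1 score should be interpreted.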
Definitions of stigmatizing language varied in the studies reviewed, and also represent an area for future research. Stigmatizing language may best be defined by the vulnerable populations at risk, in partnership with researchers. Further, discipline-specific language should be discussed and agreed upon, as this may vary by patient population. For example, guidelines have been suggested for addressing the intersectional nature of language in the care of birthing people [27].
Three studies reviewed here did not specify race or ethnicity of their clinician and patient participants [20,22,24]. This is a significant issue as patient-clinician race discordance has been associated with increased risk of mortality [28]. Racial concordance, however, does not necessarily lead to better communication as perceived by patients [29]. Given the inconsistency in reporting of race and ethnicity in the reviewed studies, future research in this area should carefully operationalize and define race and ethnicity variables extracted from the EHR. In addition, studies whose primary focus was to identify bias did not blind for patient race, as in many cases race was considered a primary predictor or variable of interest. This underscores an important gap in the literature for NLP-specific bias assessment. Blinding sensitive categories when screening records for bias may improve validity of outcome ascertainment; however, it is often necessary for reviewers to rely on context and include categories such as race and ethnicity when evaluating for stigmatizing language.
The measurement of race is a contentious issue in many medical and scientific disciplines, and though it is a social construction with no biological basis, it remains an indicator of likelihood of encountering racism and racist structures that lead to health disparities. EHR demographic data have been shown to have several quality issues, with some studies indicating that data from Latinos have higher rates of misclassification than data from other racial/ethnic groups [30]. It is important to consider who enters race and ethnicity data in the EHR, as patient self-identification is often used as the "gold standard" in research, yet the patient's apparent phenotype may be an even more important predictor of clinician perception and subsequent clinical documentation. Indeed, recent work has identified that patient race can be predicted using machine learning algorithms applied to other clinical indicators from the EHR [31-33]. From a validity and reliability perspective, researchers must align their methodological definition of race and ethnicity with the stated research objectives. Further, consistent definitions of racial and ethnic categories are essential to identifying associations between stigmatizing language use and patient outcomes as future studies developing interventions are considered. Future research should include larger proportions of minoritized patient and clinician participants to elucidate these issues further, and examine the underlying factors associated with poorer outcomes in various healthcare settings.
Finally, six of the studies reviewed [12,13,19-22] included physicians, and many included other health care provider types (e.g., nurses, respiratory therapists, pharmacists) either alone [24] or in addition to physician notes/participants [12,19,20]. Limited information was provided about the type of notes that were analyzed. Further detail about the type of clinicians and notes would allow for the identification of what other disciplines are reading or writing to draw conclusions about the transmission of bias over the trajectory of patient care.
There are several opportunities for policy change to address the use of stigmatizing language in clinical documentation. First, stigmatizing language can be identified automatically with NLP. NLP-based solutions can be developed and integrated into routine documentation systems to screen for stigmatizing language and alert clinicians or their supervisors. Previously published instances of flags in EHR documentation have provided evidence of improved outcomes of care, including in diagnosis of stroke, increasing health care access for patients at risk of suicide, and improving community rates of Hepatitis C screening for those at high risk [34-36]. To our knowledge, NLP findings of stigmatizing language use in the EHR have not yet been applied to clinical practice, identifying a need for future research that could lead to practice and policy change.
Second, clinicians' less than optimal working conditions may contribute to burnout and negative language use toward patients. One study found that resident physicians who reported higher levels of burnout had greater explicit and implicit racial biases [37]. Individually focused interventions for clinicians, such as mindfulness training, have also been suggested as a method to reduce bias in clinical care, [38] but have yet to be evaluated. A study carried out on nurses in Taiwan suggested that workplace burnout was associated with poorer patient care outcomes, though stigmatizing language was not examined [39]. The COVID-19 pandemic has also contributed to moral injury for nurses, affecting patient care [40]. Burnout does not create an environment in which clinicians can foster and sustain empathy for patients, and empathy is a critical component of reducing bias and building support for antiracism efforts to reduce inequities [41,42]. Antiracism and bias efforts in hospitals should include analyzing whether clinician burnout is associated with stigmatizing language use in EHR documentation, and whether it reinforces bias between clinicians, potentially contributing to health inequities.
In summary, this review highlights a new and promising application of qualitative research and NLP to clinical documentation in the study of racial and ethnic disparities in health care. We suggest that further research be done applying NLP to identify stigmatizing language, with the ultimate goal of reducing clinicians' stigmatizing language use in health documentation. By improving identification of stigmatizing language through NLP and other methods, potential interventions can be developed to generate awareness and design educational interventions about how implicit biases affect communication patterns and work to achieve equitable health care for diverse populations.